Abstract

The problem of filtering finite-alphabet stationary ergodic time series is considered. A method for constructing a confidence set for the (unknown) signal is proposed, such that the resulting set has the following properties. First, it includes the unknown signal with probability $\gamma$, where $\gamma$ is a parameter supplied to the filter. Second, the size of the confidence set grows exponentially with a rate that is asymptotically equal to the conditional entropy of the signal given the data. Moreover, it is shown that this rate is optimal. We also show that the described construction of the confidence set can be applied in the case where the signal is corrupted by an erasure channel with unknown statistics.

1 Introduction 

The problem of estimating a discrete signal $X_1, \dots, X_t$ from a noisy version $Z_1, \dots, Z_t$ has attracted the attention of many researchers due to its great importance for statistics, computer science, image processing, astronomy, biology, cryptography, information theory and many other fields. Most of this attention is focused on developing methods of estimation (denoising, or filtering) of the unknown signal, with the performance measured under a given fidelity criterion; see [5], [3] and references therein. Such an approach is close in spirit to the problem of point estimation in statistics.

An alternative approach, often considered in mathematical statistics, is that of constructing confidence sets. That is, one tries to use the data to construct a set that includes the unknown parameter (in our case, the signal) with a prescribed probability, while keeping the size of the set as small as possible (some classical examples of the use of this method in statistics can be found in, e.g., [ED]). Such a set is usually constructed as the set of the most likely values of the parameter.

The reason why such an approach is of interest is as follows. In the presence of noise, exact recovery of the signal is typically impossible, and thus any estimate of it is necessarily imperfect. The choice of a particular estimate of the signal out of many likely alternatives is largely arbitrary. Moreover, the optimal choice may depend on the specific application involved. The confidence-set approach effectively abstracts from the problem of choosing the "best" estimate, proposing, instead, a set of estimates. The performance of a method is then characterized by the size of the confidence set (depending on the confidence level). This is the approach taken, and these are the problems considered, in this work. We consider a model in which the underlying noiseless signal and the resulting corrupted (noisy) signal (and thus the channel) are assumed to be stationary ergodic processes with finite alphabets. We mainly concentrate on the case where the probability distributions of the noiseless signal and of the noisy channel are known. (Obviously, in this case the distribution of the corrupted signal is known, too.) Besides, the case of an erasure channel with unknown distribution is briefly considered, because in this case the conditional distribution of the noiseless signal given the corrupted one is known even though the distribution of the noise is unknown. The results that we obtain establish the optimal rate of growth (with respect to time, or to the length of the signal) of the size of the confidence set, as well as a method for constructing such a set. The optimal rate turns out to be equal to the conditional entropy of the signal given its noisy version.

Let us consider an example that illustrates our approach and introduces the notation. Let the signal be binary (with the alphabet $\{0, 1\}$), and suppose that it is transmitted through a memoryless binary erasure channel (see, e.g., [pQ]). The binary erasure channel with erasure probability $\pi$ is defined as a channel with binary input and ternary output (with the alphabet $\{0, 1, *\}$). The channel replaces each input symbol 0 or 1 with the output symbol $*$ with probability $\pi$ (erasure), and passes the input symbol to the output otherwise (that is, with probability $1 - \pi$).
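The channel just described is easy to simulate. The following minimal Python sketch is only an illustration and not part of the method studied here; the function name `binary_erasure_channel` and the encoding of symbols as the characters '0', '1' and '*' are hypothetical choices.

```python
import random

ERASURE = "*"


def binary_erasure_channel(bits, pi, rng=None):
    """Memoryless binary erasure channel.

    Each input symbol ('0' or '1') is independently replaced by '*' with
    probability pi and passed to the output unchanged with probability 1 - pi.
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("erasure probability must lie in [0, 1]")
    rng = rng or random.Random()
    return "".join(ERASURE if rng.random() < pi else b for b in bits)


# Usage: transmit a short binary word with erasure probability 0.5.
print(binary_erasure_channel("0110", 0.5, random.Random(0)))
```

Setting `pi` to 0 or 1 gives the two degenerate channels (no erasures, everything erased), which makes the behaviour easy to check.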

Suppose that the noiseless sequence is generated by an i.i.d. source $P$ with $P\{X_i = 0\} = 0.9$, and let the erasure probability be any $\pi \in (0, 1)$; that is, the erasure probability is unknown. Suppose that the noise-corrupted sequence is

$$Z_1 \dots Z_4 = 0{*}1{*}.$$

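Then we have the following conditional probability distribution for the noiseless signal (only the erased positions 2 and 4 are uncertain):

$$
\begin{aligned}
P(X_1 \dots X_4 = 0010 \mid Z_1 \dots Z_4) &= 0.9 \cdot 0.9 = 0.81,\\
P(X_1 \dots X_4 = 0011 \mid Z_1 \dots Z_4) &= 0.9 \cdot 0.1 = 0.09,\\
P(X_1 \dots X_4 = 0110 \mid Z_1 \dots Z_4) &= 0.1 \cdot 0.9 = 0.09,\\
P(X_1 \dots X_4 = 0111 \mid Z_1 \dots Z_4) &= 0.1 \cdot 0.1 = 0.01.
\end{aligned}
$$

If we take the confidence level $\gamma = 0.99$, the confidence set will contain the following three sequences: $\{0010, 0110, 0011\}$, whose conditional probabilities sum to exactly $0.81 + 0.09 + 0.09 = 0.99$.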
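The numbers above can be reproduced mechanically. The sketch below assumes, as in the example, an i.i.d. source and erasures that occur independently of the signal, so that only the erased positions are uncertain; exact `Fraction` arithmetic avoids rounding at the boundary $0.99$. The helper names `posterior` and `smallest_confidence_set` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# i.i.d. source of the example: P{X_i = 0} = 0.9, P{X_i = 1} = 0.1.
P_SOURCE = {"0": Fraction(9, 10), "1": Fraction(1, 10)}


def posterior(z):
    """Return P(x_{1..t} | z_{1..t}) for every x consistent with z.

    Assumes erasures occur independently of the signal, so non-erased
    symbols are known exactly and each erased symbol keeps its source
    probability; the (unknown) erasure probability cancels out.
    """
    erased = [i for i, s in enumerate(z) if s == "*"]
    post = {}
    for fill in product("01", repeat=len(erased)):
        x = list(z)
        prob = Fraction(1)
        for i, b in zip(erased, fill):
            x[i] = b
            prob *= P_SOURCE[b]
        post["".join(x)] = prob
    return post


def smallest_confidence_set(post, gamma):
    """Most probable sequences (ties broken lexicographically), taken
    until their total conditional probability reaches gamma."""
    ranked = sorted(post, key=lambda x: (-post[x], x))
    chosen, mass = [], Fraction(0)
    for x in ranked:
        if mass >= gamma:
            break
        chosen.append(x)
        mass += post[x]
    return chosen, mass


chosen, mass = smallest_confidence_set(posterior("0*1*"), Fraction(99, 100))
print(chosen, mass)  # ['0010', '0011', '0110'] 99/100
```

The lexicographic tie-break used in the formal construction of Section 2 lists 0011 before 0110; the resulting set is the same as above.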

The goal of this paper is to describe a construction of confidence sets and to estimate their size for the case when the signal and the noise are stationary ergodic processes with finite alphabets. It is shown that for any $\gamma \in (0, 1)$ the size of the confidence set grows exponentially with the rate $h(X|Z)$, where $h(X|Z)$ is the limiting conditional Shannon entropy (the conditional entropy rate) of the signal given its noisy version. This result is valid both when the probability distributions of the noiseless signal and of the noise are known, and when the probability distribution of the signal is known and the noise is described by a stationary erasure channel with memory whose probability distribution is unknown. Moreover, we prove that the rate $h(X|Z)$ is minimal, which means that the suggested method of constructing confidence sets is asymptotically optimal.

It is worth noting that information theory is deeply connected with the statistics of time series and with signal processing; see, for example, [TJ [H [7J QUI [TT1 [HJ [T5] and [HI HI H], respectively. In this paper a new connection of this kind is established: it is shown that the Shannon entropy determines the rate of growth of the size of the confidence set for the signal, given its version corrupted by stationary noise.

2 The confidence sets and their properties 

We consider the case where the signal $X = X_1, X_2, \dots$ and its noisy version $Z = Z_1, Z_2, \dots$ are described by stationary ergodic processes with finite alphabets $\mathcal{X}$ and $\mathcal{Z}$, respectively. (There may be arbitrarily long-range dependencies between the variables.) It is assumed that the probability distributions of both processes are known and, hence, the statistical structure of the noise corrupting the signal $X = X_1, X_2, \dots$ is known, too. We use the shorthand notation $X_{1..t}$ for $X_1, \dots, X_t$, and analogously for $Z$.

Informally, for any $\gamma \in (0, 1)$ and any sequence $Z_1, \dots, Z_t$ we define the confidence set $\Psi^\gamma_t(Z_1, Z_2, \dots, Z_t)$ as follows: the set contains the sequences $x_1, x_2, \dots, x_t$ whose probabilities $P(x_{1..t}|Z_{1..t})$ are maximal and sum to $\gamma$. This definition is not precise, since it is possible that the sum cannot be made exactly equal to $\gamma$. That is why the formal definition of the confidence set uses randomization.

For this purpose, we order all sequences $x_{1..t}$ according to their conditional probabilities, in decreasing order. That is, we enumerate all sequences $x^1_{1..t}, x^2_{1..t}, \dots \in \mathcal{X}^t$ in such a way that $a_{1..t} \in \mathcal{X}^t$ has a smaller index than $b_{1..t} \in \mathcal{X}^t$ if either $P(a_{1..t}|Z_{1..t}) > P(b_{1..t}|Z_{1..t})$, or $P(a_{1..t}|Z_{1..t}) = P(b_{1..t}|Z_{1..t})$ and $a_{1..t}$ is lexicographically less than $b_{1..t}$. Let $j$ be the integer for which

$$\sum_{i=1}^{j-1} P(x^i_{1..t}|Z_{1..t}) < \gamma \quad\text{and}\quad \sum_{i=1}^{j} P(x^i_{1..t}|Z_{1..t}) \ge \gamma.$$

If $\sum_{i=1}^{j} P(x^i_{1..t}|Z_{1..t}) = \gamma$, then define $\Psi^\gamma_t(Z_{1..t})$ as the set $\{x^1_{1..t}, \dots, x^j_{1..t}\}$. Otherwise, $\Psi^\gamma_t(Z_{1..t})$ contains the first $j - 1$ elements and, additionally, the element $x^j_{1..t}$ with probability

$$\Big(\gamma - \sum_{i=1}^{j-1} P(x^i_{1..t}|Z_{1..t})\Big) \Big/ P(x^j_{1..t}|Z_{1..t}).$$

(Note that this procedure is commonly used in mathematical statistics to make the confidence level exactly $\gamma$.) When talking about the sizes of the confidence sets, we refer to their expected size (with respect to the randomization).
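The randomized construction can be written down directly. This is a sketch under the assumption that the conditional probabilities $P(x_{1..t}|Z_{1..t})$ are available as an explicit dictionary, which is feasible only for short $t$; the function name `randomized_confidence_set` is hypothetical. Besides a sample of $\Psi^\gamma_t$, it returns the expected size $(j - 1) + (\gamma - \sum_{i<j} P(x^i_{1..t}|Z_{1..t}))/P(x^j_{1..t}|Z_{1..t})$.

```python
import random
from fractions import Fraction


def randomized_confidence_set(post, gamma, rng=None):
    """Sample Psi^gamma_t from a posterior {x: P(x | z)}.

    Returns (sampled_set, expected_size). Sequences are ranked by
    decreasing probability with lexicographic tie-breaking; the first j-1
    are always included and x^j is included with probability
    (gamma - sum_{i<j} P(x^i | z)) / P(x^j | z).
    """
    ranked = sorted(post, key=lambda x: (-post[x], x))
    mass = 0  # running sum_{i<j} P(x^i | z)
    for j, x in enumerate(ranked, start=1):
        if mass + post[x] >= gamma:
            break
        mass += post[x]
    else:
        raise ValueError("gamma exceeds the total probability")
    q = (gamma - mass) / post[x]  # inclusion probability of x^j (1 if the sum hits gamma)
    rng = rng or random.Random()
    chosen = ranked[: j - 1]
    if rng.random() < q:
        chosen.append(x)
    return chosen, (j - 1) + q


# Posterior of the introductory example, Z_1..Z_4 = 0*1*.
post = {"0010": Fraction(81, 100), "0011": Fraction(9, 100),
        "0110": Fraction(9, 100), "0111": Fraction(1, 100)}
chosen, expected_size = randomized_confidence_set(post, Fraction(95, 100), random.Random(0))
print(expected_size)  # 23/9
```

With $\gamma = 0.95$ the two most probable sequences are always included and 0110 is included with probability $5/9$, so the expected size is $2 + 5/9$.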

Next, we estimate the size of the described confidence set. 

Theorem 1. Let an (unknown) signal $X = X_1X_2\dots$ and its noisy version $Z = Z_1Z_2\dots$ be stationary ergodic processes with finite alphabets. Then, for every $\gamma \in (0, 1)$, all $t \in \mathbb{N}$ and almost every $Z_1, \dots, Z_t$, the confidence set $\Psi^\gamma_t(Z_1, \dots, Z_t)$ contains the unknown $(X_1, \dots, X_t)$ with probability $\gamma$:

$$P\{X_{1..t} \in \Psi^\gamma_t(Z_{1..t})\} = \gamma, \qquad (1)$$

while, with probability 1, the size of the set $\Psi^\gamma_t(Z_1, \dots, Z_t)$ grows exponentially with an exponent rate equal to the conditional entropy:

$$\lim_{t \to \infty} \frac{1}{t} \log \mathbb{E}\,|\Psi^\gamma_t(Z_1, \dots, Z_t)| = h(X|Z) \quad \text{a.s.}, \qquad (2)$$

where the expectation is with respect to the randomization used in constructing the confidence sets.

Proof. Statement (1) follows immediately from the construction of the set $\Psi^\gamma_t(Z_1Z_2\dots Z_t)$.

The proof of (2) is based on the Shannon-McMillan-Breiman theorem [1], [3], which for the conditional entropy implies the following:

Lemma 1 (Shannon-McMillan-Breiman). For all $\varepsilon > 0$ and $\delta > 0$, for almost all $Z_1, Z_2, \dots$ there exists $n'$ such that if $n > n'$ then

$$P\Big\{\Big|-\frac{1}{n}\log P(X_{1..n}|Z_{1..n}) - h(X|Z)\Big| < \varepsilon\Big\} \ge 1 - \delta. \qquad (3)$$
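Lemma 1 can be observed numerically in the simplest special case. The sketch below assumes an i.i.d. binary source with $P\{X_i = 1\} = p_1$ and a memoryless erasure channel acting independently of the signal; under these assumptions $h(X|Z) = \pi H(p_1)$ bits, where $H$ is the binary entropy, because a symbol is uncertain only when it is erased. It is an illustration only, not part of the proof, and the helper names are hypothetical.

```python
import math
import random


def binary_entropy(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def normalized_log_posterior(n, p1, pi, rng):
    """Draw X_1..X_n i.i.d. with P{X_i = 1} = p1, erase each symbol
    independently with probability pi, and return
    -(1/n) log2 P(X_{1..n} | Z_{1..n})."""
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p1 else 0
        if rng.random() < pi:
            # Erased symbol: given Z_i = '*', X_i keeps its source probability.
            total -= math.log2(p1 if x == 1 else 1 - p1)
        # A non-erased symbol is known exactly and contributes log2(1) = 0.
    return total / n


rng = random.Random(42)
p1, pi = 0.1, 0.5
h = pi * binary_entropy(p1)  # h(X|Z) for this memoryless model
for n in (100, 10_000, 200_000):
    print(n, round(normalized_log_posterior(n, p1, pi, rng), 4), round(h, 4))
```

As $n$ grows, the printed value of $-\frac{1}{n}\log_2 P(X_{1..n}|Z_{1..n})$ should settle near $h(X|Z) \approx 0.2345$.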



Take any $\varepsilon > 0$ and any $\delta > 0$ such that

$$1 - \delta > \gamma. \qquad (4)$$

According to the lemma, for almost all $Z_1, Z_2, \dots$ there exists $n'$ such that (3) is valid if $n > n'$. Take any such $n$ and rewrite (3) as follows:

$$P\big\{2^{-n(h(X|Z)+\varepsilon)} < P(X_{1..n}|Z_{1..n}) < 2^{-n(h(X|Z)-\varepsilon)}\big\} \ge 1 - \delta. \qquad (5)$$

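Thus, the total probability of all strings $x_1, \dots, x_n$ for which $P(x_{1..n}|Z_{1..n}) > 2^{-n(h(X|Z)+\varepsilon)}$ is at least $1 - \delta$. Taking into account (4), this total exceeds $\gamma$; since $\Psi^\gamma_n(Z_{1..n})$ is built from the most probable strings, it consists of such strings only, and there are at most $2^{n(h(X|Z)+\varepsilon)}$ of them because their probabilities sum to at most 1. Hence

$$|\Psi^\gamma_n(Z_{1..n})| \le 1/2^{-n(h(X|Z)+\varepsilon)},$$

so that

$$\frac{1}{n}\log|\Psi^\gamma_n(Z_{1..n})| \le h(X|Z) + \varepsilon + O(1/n) \qquad (6)$$

for $n > n'$. Since (6) holds for every small $\varepsilon > 0$, we obtain the upper bound in (2); the matching lower bound follows from the argument given for Theorem 2 in Section 3. □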
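The growth rate in (2) can also be computed exactly in a special case: an i.i.d. source with $P\{X_i = 1\} = p_1 < 1/2$ and erasures independent of the signal. Fillings of the erased positions with the same number of ones are equally probable, so $\mathbb{E}|\Psi^\gamma_t|$ follows from binomial coefficients. As a simplifying assumption, the sketch fixes the number of erasures at its typical value $k = \pi t$ instead of sampling it; the function name is hypothetical and the whole block is an illustration, not the paper's computation.

```python
import math
from fractions import Fraction


def expected_confidence_set_size(k, p1, gamma):
    """Exact E|Psi^gamma| when k symbols of an i.i.d. signal with
    P{X_i = 1} = p1 < 1/2 are erased (erasures independent of the signal).

    A filling of the erased positions with m ones has probability
    p1^m (1 - p1)^(k - m), so fillings with fewer ones come first and the
    C(k, m) fillings with m ones are tied.
    """
    count_before, mass = 0, Fraction(0)
    for m in range(k + 1):
        q = p1 ** m * (1 - p1) ** (k - m)  # probability of one filling
        class_mass = math.comb(k, m) * q
        if mass + class_mass >= gamma:
            # Within a tie class every member has probability q, so
            # (gamma - mass) / q members are used in expectation.
            return count_before + (gamma - mass) / q
        count_before += math.comb(k, m)
        mass += class_mass
    raise ValueError("gamma must lie in (0, 1]")


p1, pi, gamma = Fraction(1, 10), 0.5, Fraction(9, 10)
h = pi * -(0.1 * math.log2(0.1) + 0.9 * math.log2(0.9))  # h(X|Z) in bits
rates = []
for t in (200, 800, 3200):
    k = round(pi * t)  # typical number of erasures among t symbols
    size = expected_confidence_set_size(k, p1, gamma)
    # log2 of a Fraction via its (possibly huge) integer numerator and denominator
    rates.append((math.log2(size.numerator) - math.log2(size.denominator)) / t)
    print(t, round(rates[-1], 4), round(h, 4))
```

The printed rates decrease toward $h(X|Z)$ only slowly; in this model the excess over $h(X|Z)$ shrinks roughly like $t^{-1/2}$.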



3 Optimality of the confidence set 

Theorem 2. Let an (unknown) signal $X = X_1X_2\dots$ and its noisy version $Z = Z_1Z_2\dots$ be stationary ergodic processes with finite alphabets $\mathcal{X}$ and $\mathcal{Z}$. Let $\Phi^\gamma_t(Z_{1..t})$, $t \in \mathbb{N}$, be confidence sets such that for some $\gamma \in (0, 1)$ we have $P(X_{1..t} \in \Phi^\gamma_t(Z_{1..t})) \ge \gamma$ for all $t \in \mathbb{N}$ and almost all $Z_{1..t} \in \mathcal{Z}^t$. Then, with probability 1,

$$\liminf_{t \to \infty} \frac{1}{t}\log|\Phi^\gamma_t(Z_1, \dots, Z_t)| \ge h(X|Z). \qquad (7)$$
Proof. The proof uses the Shannon-McMillan-Breiman theorem in the form (3). As before, we take any $\varepsilon > 0$ and fix $\delta := \gamma/2$. Then from some $n$ on we have (5). Let $T = \Phi^\gamma_n(Z_{1..n})$ be the confidence set for this $n$. Define

$$\Delta = \{x_{1..n} : 2^{-n(h(X|Z)+\varepsilon)} < P(x_{1..n}|Z_{1..n}) < 2^{-n(h(X|Z)-\varepsilon)}\}. \qquad (8)$$

By definition, $\sum_{x_{1..n} \in T} P(x_{1..n}|Z_{1..n}) \ge \gamma$. From this and (5) we obtain

$$\sum_{x_{1..n} \in T \cap \Delta} P(x_{1..n}|Z_{1..n}) \ge \gamma - \delta.$$

From this and (8) we get

$$|T| \ge |T \cap \Delta| \ge (\gamma - \delta)\, 2^{n(h(X|Z)-\varepsilon)}.$$

Hence, since $\gamma - \delta = \gamma/2 > 0$,

$$\liminf_{n \to \infty} \frac{1}{n}\log|\Phi^\gamma_n(Z_{1..n})| \ge h(X|Z) - \varepsilon.$$

Since this inequality holds for any such confidence sets and any $\varepsilon > 0$, we obtain (7). □
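At a fixed length, the optimality behind Theorem 2 can be checked exhaustively: no deterministic set with conditional probability at least $\gamma$ has fewer than $j$ elements, the number used by the most-probable-first construction of Section 2 (whose randomized version has expected size at most $j$). The brute-force sketch below compares the two on a toy posterior, three erased symbols of an i.i.d. source with $P\{X_i = 1\} = 1/10$; it is an illustration only, and the function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations, product


def greedy_size(post, gamma):
    """Number j of most probable sequences needed to reach mass gamma."""
    mass = 0
    for j, p in enumerate(sorted(post.values(), reverse=True), start=1):
        mass += p
        if mass >= gamma:
            return j
    raise ValueError("gamma exceeds the total probability")


def brute_force_min_size(post, gamma):
    """Smallest |T| over all sets T with sum_{x in T} P(x | z) >= gamma."""
    probs = list(post.values())
    for size in range(1, len(probs) + 1):
        if any(sum(c) >= gamma for c in combinations(probs, size)):
            return size
    raise ValueError("gamma exceeds the total probability")


# Posterior for three erased symbols of the i.i.d. source with P{X_i = 1} = 1/10.
P1 = Fraction(1, 10)
post = {"".join(x): math.prod(P1 if b == "1" else 1 - P1 for b in x)
        for x in product("01", repeat=3)}
for gamma in (Fraction(1, 2), Fraction(9, 10), Fraction(95, 100)):
    print(gamma, greedy_size(post, gamma), brute_force_min_size(post, gamma))
```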

4 Erasure channel with unknown statistics 

In this section we consider the case when the channel statistics are unknown, but the channel has a specific form: it is an erasure channel for which the probability of erasure is the same for all symbols. We show that the confidence set described above is asymptotically optimal in this case, too. The point is that in this case the conditional probabilities $P(X_{1..n}|Z_{1..n})$ are known, which is why the construction of Section 2 is directly applicable.

The formal description of the model is as follows. We still assume that there is a known stationary ergodic source generating the signal $X_1, X_2, \dots$. The erasure channel is defined in the following two steps: first, there is a stationary ergodic process $\theta = \theta_1, \theta_2, \dots$ generating letters from the alphabet $\{\Lambda, *\}$, and, second, the noisy channel is determined by the following "summation" of the (uncorrupted) sequence $X_1, X_2, \dots$ and the noise sequence $\theta_1, \theta_2, \dots$:

$$Z_i = \begin{cases} X_i & \text{if } \theta_i = \Lambda,\\ * & \text{if } \theta_i = *. \end{cases}$$
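The fact that $P(X_{1..n}|Z_{1..n})$ does not depend on the law of $\theta$ can be checked by brute force on short sequences. The sketch below makes the explicit assumption that $\theta$ is independent of $X$; the Markov source, its transition probabilities, both erasure laws, and all function names are hypothetical, and the letter 'L' stands for $\Lambda$. It computes the posterior from the joint law of $(X, \theta)$ under two different erasure processes and once from the source alone.

```python
from fractions import Fraction
from itertools import product

# Hypothetical stationary two-state Markov source on {0, 1} (illustrative numbers).
INIT = {"0": Fraction(3, 5), "1": Fraction(2, 5)}  # stationary law of TRANS
TRANS = {("0", "0"): Fraction(4, 5), ("0", "1"): Fraction(1, 5),
         ("1", "0"): Fraction(3, 10), ("1", "1"): Fraction(7, 10)}


def source_prob(x):
    """P(X_{1..t} = x) under the Markov source."""
    p = INIT[x[0]]
    for a, b in zip(x, x[1:]):
        p *= TRANS[(a, b)]
    return p


def channel_output(x, theta):
    """Z_i = X_i if theta_i = 'L' (stands for Lambda), and '*' otherwise."""
    return "".join(xi if ti == "L" else "*" for xi, ti in zip(x, theta))


def posterior_from_source(z):
    """P(x | z) from the source alone: keep sequences that agree with z
    on non-erased positions and renormalize."""
    cand = ["".join(x) for x in product("01", repeat=len(z))]
    cand = [x for x in cand if all(s in ("*", xi) for s, xi in zip(z, x))]
    total = sum(source_prob(x) for x in cand)
    return {x: source_prob(x) / total for x in cand}


def posterior_from_joint(z, theta_law):
    """P(x | z) from the joint law of (X, theta), theta independent of X."""
    joint = {}
    for x in map("".join, product("01", repeat=len(z))):
        for theta in map("".join, product("L*", repeat=len(z))):
            if channel_output(x, theta) == z:
                joint[x] = joint.get(x, 0) + source_prob(x) * theta_law(theta)
    total = sum(joint.values())
    return {x: p / total for x, p in joint.items()}


def iid_erasures(theta, pi=Fraction(1, 4)):
    """Memoryless erasures with probability pi."""
    p = Fraction(1)
    for t in theta:
        p *= pi if t == "*" else 1 - pi
    return p


def bursty_erasures(theta):
    """Hypothetical stationary Markov erasure process: the state persists w.p. 9/10."""
    p = Fraction(1, 2)
    for a, b in zip(theta, theta[1:]):
        p *= Fraction(9, 10) if a == b else Fraction(1, 10)
    return p


z = "0*1**0"
same = (posterior_from_joint(z, iid_erasures)
        == posterior_from_joint(z, bursty_erasures)
        == posterior_from_source(z))
print(same)  # True
```

Given $Z$, the pattern of erasures is determined, so the probability of that pattern is a common factor that cancels on normalization.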



Theorem 3. Let $X = X_1X_2\dots$ be an (unknown) stationary ergodic signal and let $Z = Z_1Z_2\dots$ be its version corrupted by an unknown stationary erasure channel. Then, for every $\gamma \in (0, 1)$, all $t \in \mathbb{N}$ and almost every $Z_1, \dots, Z_t$, the confidence set $\Psi^\gamma_t(Z_1, \dots, Z_t)$ described above contains the unknown $(X_1, \dots, X_t)$ with probability $\gamma$:

$$P\{X_{1..t} \in \Psi^\gamma_t(Z_{1..t})\} = \gamma, \qquad (9)$$

while, with probability 1, the size of the set $\Psi^\gamma_t(Z_1, \dots, Z_t)$ grows exponentially with an exponent rate equal to the conditional entropy:

$$\lim_{t \to \infty} \frac{1}{t}\log \mathbb{E}\,|\Psi^\gamma_t(Z_1, \dots, Z_t)| = h(X|Z) \quad \text{a.s.}, \qquad (10)$$

where the expectation is with respect to the randomization used in constructing the confidence sets.

Proof. It is enough to notice that, although the erasure channel is unknown, the probabilities $P(X_{1..n}|Z_{1..n})$ are known. Therefore, the proof of this theorem is identical to that of Theorem 1. □

5 Discussion 

To the best of our knowledge, the problem of constructing a confidence set for the unknown signal has not been considered before, which is why there are many quite natural extensions and generalizations of the present work.
First, it is interesting to consider this problem for certain specific classes of 
distributions of the signal and noise, such as i.i.d. and Markov sources. For 
these classes of sources it should be possible to obtain rates of convergence in those statements that in this work are only asymptotic, for example, in (2).

Second, a natural question is to find a construction of the confidence set for the case where the signal is multi-dimensional. This is particularly important for applications, many of which are concerned with denoising such objects as photographs or video fragments. Another interesting generalization is the case where the alphabets are, for example, (subsets of) a Euclidean space; this generalization can also be of practical interest. Finally, the case where the statistics of the noise and/or the signal are unknown is obviously of great theoretical and practical interest.